Exploring the spatial frequency requirements of audio-visual speech using superimposed facial motion
Authors
Abstract
While visually complex stimuli such as human faces contain information across a wide range of spatial frequencies, the information relevant to a specific perceptual judgement may be concentrated in distinct spatial-frequency bands. For example, previous work on static face perception has shown that face recognition relies primarily on low spatial-frequency information, while other tasks, such as identifying facial expressions, may require higher spatial frequencies. An innovative approach to identifying such spatial-frequency biases has been the use of hybrid visual stimuli: superimpositions of two distinct images, one spatially filtered to remove high spatial-frequency information (i.e., low-pass filtered) and the other filtered to remove low spatial-frequency information (i.e., high-pass filtered) (Schyns and Oliva, 1999). By placing these two spatial-frequency portions of the image in direct competition with each other, hybrid stimuli allow the identification of spatial-frequency bands that are preferentially processed by the visual system, rather than merely sufficient for the task.
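To make the construction concrete, the following is a minimal Python sketch of how such a hybrid stimulus can be generated with a Gaussian filter; the function name make_hybrid and the cutoff parameter sigma are illustrative assumptions, not details taken from the study.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_hybrid(image_a, image_b, sigma=6.0):
    """Build a hybrid stimulus from two same-shaped grayscale images.

    The low spatial frequencies of image_a are superimposed on the
    high spatial frequencies of image_b, placing the two bands in
    direct competition (cf. Schyns and Oliva, 1999). sigma is the
    Gaussian blur width in pixels; it is an illustrative default,
    not a value taken from the paper.
    """
    a = image_a.astype(float)
    b = image_b.astype(float)
    low = gaussian_filter(a, sigma)        # low-pass: keep coarse structure
    high = b - gaussian_filter(b, sigma)   # high-pass: keep fine detail
    hybrid = low + high                    # superimpose the two bands
    # Rescale to [0, 1] for display.
    return (hybrid - hybrid.min()) / (hybrid.max() - hybrid.min())
```

Viewed up close, the resulting image is dominated by the high-pass component; from a distance, or when blurred, the low-pass component dominates, which is what allows the two frequency bands to compete for perception.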
Similar resources
Speech Driven MPEG-4 Facial Animation for Turkish
In this study, a system that generates visual speech by synthesizing 3D face points has been implemented. The synthesized face points drive MPEG-4 facial animation. To produce realistic and natural speech animation, a codebook-based technique, trained with audio-visual data from a speaker, was employed. An audio-visual speech database was created using a 3D facial motion capture syst...
Event-Related Potentials Associated with Somatosensory Effect in Audio-Visual Speech Perception
Speech perception often involves multisensory processing. Although previous studies have demonstrated visual [1, 2] and somatosensory interactions [3, 4] with auditory processing, it is not clear whether somatosensory information can contribute to audio-visual speech perception. This study explored the neural consequences of somatosensory interactions in audio-visual speech pro...
A comparison of acoustic coding models for speech-driven facial animation
This article presents a thorough experimental comparison of several acoustic modeling techniques in terms of their ability to capture information related to orofacial motion. These models include (1) Linear Predictive Coding and Line Spectral Frequencies, which model the dynamics of the speech production system, (2) Mel Frequency Cepstral Coefficients and Perceptual Critical Feature Bands, which encod...
Real-time speech-driven face animation with expressions using neural networks
A real-time speech-driven synthetic talking face provides an effective multimodal communication interface in distributed collaboration environments. Nonverbal gestures such as facial expressions are important to human communication and should be considered by speech-driven face animation systems. In this paper, we present a framework that systematically addresses facial deformation modeling, au...
Speaker-independent 3D face synthesis driven by speech and text
In this study, a complete system that generates visual speech by synthesizing 3D face points has been implemented. The estimated face points drive MPEG-4 facial animation. This system is speaker independent and can be driven by audio or both audio and text. The synthesis of visual speech was realized by a codebook-based technique, which is trained with audio-visual data from a speaker. An audio...